GitLab Runners Setup
We use GitLab Runners to run our CI/CD pipelines. GitLab provides a managed runner service, but we operate our own fleet of runners to get the level of performance we need.
Additionally, we were already exceeding the 10,000 CI minutes per month included with GitLab's managed runners and paying ~60 USD per month for an extra 7,000 minutes. That extra charge alone is roughly what our new runner fleet costs, not counting the managed runners' base cost, although running our own fleet does add the overhead of managing it ourselves.
Managing our own runners offers these benefits:
- We can control the specifications of the server where the runner is executed
- We can apply performance-improving settings globally, across all pipeline jobs
- We can take advantage of Docker image caching to speed up each job's boot process
- We have full control over the infrastructure and can scale it as needed
However, it's important to note that we are fully responsible for:
- Maintaining and updating the infrastructure
- Monitoring system health and performance
- Troubleshooting any issues that arise
- Ensuring security and reliability
- Managing costs and resource utilization
Setting up a Fleet of GitLab Runners
This guide provides step-by-step instructions to set up multiple GitLab runners on Hetzner Cloud using an autoscaled Docker executor, managed from a central Hetzner instance.
We use Hetzner because of its high performance and relatively low cost.
Prerequisites
- Hetzner Cloud account
- GitLab.com group owner access
- Basic understanding of Docker and Linux administration
Architecture
┌──────────────┐
│ GitLab.com │
└──────┬───────┘
│
┌─────────┴─────────┐
│ Runners Manager │
│ (cpx11) │
│ Hetzner nbg1 │
└─────────┬─────────┘
│
┌─────────────────┼───────────────────────┐
│ │ │
┌─────┴───────┐ ┌─────┴───────┐ ┌───────┴─────┐
│ Runner 1 │ │ Runner 2 │ │ Runner N │
│ (ccx53) │ │ (ccx53) │ ● ● ● │ (ccx53) │
│ Hetzner fsn1│ │ Hetzner fsn1│ │ Hetzner fsn1│
└─────┬───────┘ └──────┬──────┘ └───────┬─────┘
│ │ │
└──────────────────┼──────────────────────┘
│
┌───────────────┴───────────────┐
│ Object Store Cache │
└───────────────────────────────┘
Components Overview
- Runners Manager (cpx11)
  - Lightweight instance that orchestrates the runner fleet
  - Handles runner registration and configuration
  - Manages autoscaling based on pipeline demand
- Runner Instances (ccx53/ccx43/ccx33/ccx23/cpx51)
  - Powerful instances that execute the actual CI/CD jobs
  - Autoscaled based on demand
  - Multiple fallback server types ensure high availability
- Object Store Cache
  - S3-compatible storage for caching dependencies
  - Speeds up builds by reusing previously downloaded packages
  - Shared across all runners (see the configuration sketch after this list)
- Security
  - Runner manager firewall rules
  - Secure communication between components
  - Isolated runner environments
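For context, the cache backend is wired up in the runner's config.toml roughly as in the following sketch. The endpoint, bucket, and credential values are placeholders; the actual settings live in the config.toml referenced later in this guide:
[runners.cache]
  Type = "s3"
  Shared = true                                # cache shared across all runners
  [runners.cache.s3]
    ServerAddress = "<OBJECT_STORE_ENDPOINT>"  # placeholder for the S3-compatible endpoint
    AccessKey = "<ACCESS_KEY>"                 # placeholder
    SecretKey = "<SECRET_KEY>"                 # placeholder
    BucketName = "<CACHE_BUCKET>"              # placeholder
    BucketLocation = "eu-central-1"
    Insecure = false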
Set Up Steps
1. Create a new group runner on GitLab.com
- Navigate to https://gitlab.com/groups/publicala/-/runners
- Click "New group runner"
- Configure the runner:
- Check "Run untagged jobs"
- Set "Maximum job timeout" to 900 seconds (15 minutes)
- Click "Create runner"
- Ensure "Operating systems" is set to Linux
- Copy the "runner authentication token" (looks like "glrt-t2_RPUmZza3qmYAWyMT9446")
2. Manager Instance Setup
- Create a new instance in Hetzner Cloud:
  - Select Ubuntu 24.04 LTS
  - Choose cpx11 instance type
  - Select nbg1 datacenter
  - Add the SSH key stored in 1Password (Hetzner - gitlab-runners - hetzner-gitlab-runners)
- Install Required Software:
# SSH into the manager
ssh root@<MANAGER_IP>
# Install GitLab Runner (latest version)
sudo curl -L --output /usr/local/bin/gitlab-runner https://gitlab-runner-downloads.s3.amazonaws.com/latest/binaries/gitlab-runner-linux-amd64
sudo chmod +x /usr/local/bin/gitlab-runner
# Create gitlab-runner user
sudo useradd --comment 'GitLab Runner' --create-home gitlab-runner --shell /bin/bash
# Install the service
sudo gitlab-runner install --user=gitlab-runner --working-directory=/home/gitlab-runner
# Verify empty config
cat /etc/gitlab-runner/config.toml
3. Configure GitLab Runner
You can configure GitLab Runner either by editing the file directly on the server or by uploading a pre-configured file:
Option A: Edit configuration directly on server
# Edit configuration file directly
nano /etc/gitlab-runner/config.toml
# Copy contents from config.toml in this directory
Option B: Upload configuration file using SCP
# From your local machine, upload the config file
scp config.toml root@<MANAGER_IP>:/etc/gitlab-runner/config.toml
# Or if you have a customized version locally:
scp path/to/your/config.toml root@<MANAGER_IP>:/etc/gitlab-runner/config.toml
# SSH into the manager to verify the upload
ssh root@<MANAGER_IP>
cat /etc/gitlab-runner/config.toml # Verify content is correct
4. Install Fleeting Plugin and Start Service
# Install fleeting plugin (requires config.toml to be configured)
sudo gitlab-runner fleeting install
# Start and verify the service
sudo gitlab-runner start
sudo gitlab-runner status # Should show "running"
The config.toml file contains the main configuration for the GitLab Runner (a minimal skeleton is sketched after this list), including:
- Runner registration token
- Docker executor settings
- Cache configuration
- Autoscaling policies
- Performance optimizations
- Fallback server types for high availability
- Enhanced cloud-init configuration for reliable Docker setup
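For orientation, here is a heavily abridged sketch of what such a config.toml can look like for a docker-autoscaler setup with the Hetzner fleeting plugin. All values are illustrative, and the plugin_config key names are an assumption to be checked against the fleeting-plugin-hetzner documentation; the authoritative file is the config.toml kept alongside this guide:
concurrent = 16                          # total concurrent jobs across all instances
listen_address = "0.0.0.0:9252"          # Prometheus metrics endpoint (see Monitoring)

[[runners]]
  name = "hetzner-fleet-01"              # illustrative name
  url = "https://gitlab.com"
  token = "glrt-..."                     # runner authentication token from step 1
  executor = "docker-autoscaler"

  [runners.docker]
    image = "alpine:latest"              # illustrative default job image
    pull_policy = "if-not-present"
    # cache configuration omitted here (see the Object Store Cache sketch above)

  [runners.autoscaler]
    plugin = "hetznercloud/fleeting-plugin-hetzner:1.1.1"
    capacity_per_instance = 8
    max_instances = 4                    # illustrative cap
    update_interval = "30s"
    update_interval_when_expecting = "2s"

    [runners.autoscaler.plugin_config]   # key names are an assumption; check the plugin docs
      name = "gitlab-runners-fleet-01"   # illustrative instance group name
      token = "<HETZNER_API_TOKEN>"      # placeholder
      location = "nbg1"
      server_type = ["ccx53", "ccx43", "ccx33", "ccx23", "cpx51"]

    [[runners.autoscaler.policy]]
      periods = ["* 9-23 * * 1-5"]       # business hours, 9AM-11PM UTC weekdays
      timezone = "UTC"
      idle_count = 1
      idle_time = "30m"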
Performance Optimizations
Our configuration includes several optimizations:
- Docker Settings
environment = [
"DOCKER_DRIVER=overlay2",
"DOCKER_BUILDKIT=1",
"FF_USE_FASTZIP=true",
"ARTIFACT_COMPRESSION_LEVEL=fast",
"CACHE_COMPRESSION_LEVEL=fast",
"TRANSFER_METER_FREQUENCY=5s",
"FF_SCRIPT_SECTIONS=true",
# Modern performance feature flags (tested and verified)
"FF_NETWORK_PER_BUILD=true",
"FF_USE_LEGACY_KUBERNETES_EXECUTION_STRATEGY=false",
"FF_RESOLVE_FULL_TLS_CHAIN=false",
"FF_DISABLE_UMASK_FOR_DOCKER_EXECUTOR=true",
"FF_ENABLE_JOB_CLEANUP=false",
]
# Docker pull policy to use local images when available
[runners.docker]
pull_policy = "if-not-present"
# Optimized autoscaling response times
[runners.autoscaler]
update_interval = "30s" # Reduced from 1m for faster scaling
update_interval_when_expecting = "2s" # Reduced from 5s for quicker response
# Enhanced capacity for parallel job processing
concurrent = 16 # Total concurrent jobs
capacity_per_instance = 8 # Jobs per instance
- Autoscaling Policy
[[runners.autoscaler.policy]]
periods = ["* 9-23 * * 1-5"] # Business hours (9AM-11PM UTC weekdays)
timezone = "UTC"
idle_count = 1
idle_time = "30m"
- Fallback Server Types (v1.1.1+ feature)
# Fallback server types in order of preference
server_type = ["ccx53", "ccx43", "ccx33", "ccx23", "cpx51"] # High availability with multiple fallback options
Performance Optimization Guide
Proven Performance Optimizations
These optimizations have been tested and verified to provide significant performance improvements:
1. Modern GitLab Feature Flags (30-50% speed boost)
environment = [
# ... existing flags
"FF_NETWORK_PER_BUILD=true", # Better network isolation per job
"FF_USE_LEGACY_KUBERNETES_EXECUTION_STRATEGY=false", # Use modern execution strategy
"FF_RESOLVE_FULL_TLS_CHAIN=false", # Skip unnecessary TLS verification
"FF_DISABLE_UMASK_FOR_DOCKER_EXECUTOR=true", # Optimize file permissions
"FF_ENABLE_JOB_CLEANUP=false", # Disable verbose cleanup logging
]
2. Enhanced Capacity
concurrent = 16 # Total concurrent jobs across all instances
capacity_per_instance = 8 # Jobs per instance (balance with server specs)
3. Faster Autoscaling
update_interval = "30s" # Check for scaling needs every 30s
update_interval_when_expecting = "2s" # Quick response when jobs are waiting
4. Docker Pull Policy Optimization
[runners.docker]
pull_policy = "if-not-present" # Use cached images when available
5. Multiple Server Type Fallbacks
server_type = ["ccx53", "ccx43", "ccx33", "ccx23", "cpx51"] # Multiple fallback options for better availability
Expected Performance Gains
- Enhanced Job Capacity: Handle significantly more parallel workload
- 30-50% Faster Builds: Modern feature flags optimize artifact/cache handling
- Improved Scaling Response: Quicker response to job queue changes
- Better Availability: Multiple server types reduce capacity issues
- Faster Container Startup: Pull policy avoids unnecessary image downloads
Optimizations to Avoid
Based on testing, these optimizations cause issues and should be avoided:
❌ Docker Image Pre-warming in Cloud-Init
# DON'T DO THIS - causes cloud-init timeouts
user_data = """
runcmd:
- docker pull node:18-alpine # Causes ready check failures
- docker pull nginx:alpine # Takes too long, fails cloud-init
"""
❌ Complex Docker Daemon Configuration
# DON'T DO THIS - can cause startup failures
user_data = """
runcmd:
- echo '{"storage-driver":"overlay2"}' > /etc/docker/daemon.json # Risky
"""
❌ Too Aggressive Scaling
# DON'T DO THIS - can overwhelm the system
update_interval = "5s" # Too frequent, causes instability
capacity_per_instance = 16 # Too high for most server types
Performance Testing Methodology
- Baseline Measurement: Record current pipeline execution times (see the sketch after this list)
- Incremental Changes: Apply one optimization at a time
- Monitor Stability: Watch for "ready up preparation failed" errors
- Measure Impact: Compare before/after pipeline times
- Document Results: Keep notes on what works for your workload
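For the baseline measurement, recent pipeline durations can be pulled from the GitLab API. A minimal sketch, assuming curl and jq are installed; <PROJECT_ID> and <API_TOKEN> are placeholders (the token needs read_api scope):
# Print id, status, and duration (seconds) for the 20 most recent pipelines
PROJECT_ID=<PROJECT_ID>
for id in $(curl -s --header "PRIVATE-TOKEN: <API_TOKEN>" \
  "https://gitlab.com/api/v4/projects/${PROJECT_ID}/pipelines?per_page=20" | jq -r '.[].id'); do
  curl -s --header "PRIVATE-TOKEN: <API_TOKEN>" \
    "https://gitlab.com/api/v4/projects/${PROJECT_ID}/pipelines/${id}" \
    | jq -r '"\(.id) \(.status) \(.duration)s"'
done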
Maintenance
Manager Instance Maintenance
Regular maintenance tasks:
# Monitor system resources
btop
# Update system packages
apt-get update && apt-get upgrade -y
# Check disk usage
df -h
# Monitor Docker
systemctl status docker
# View GitLab Runner logs
sudo gitlab-runner status
sudo journalctl -u gitlab-runner
Update GitLab Runner
# Download latest version
curl -L --output /usr/local/bin/gitlab-runner.new https://gitlab-runner-downloads.s3.amazonaws.com/latest/binaries/gitlab-runner-linux-amd64
chmod +x /usr/local/bin/gitlab-runner.new
# Check version
/usr/local/bin/gitlab-runner.new --version
# Apply update
systemctl stop gitlab-runner
mv /usr/local/bin/gitlab-runner /usr/local/bin/gitlab-runner.old
mv /usr/local/bin/gitlab-runner.new /usr/local/bin/gitlab-runner
systemctl start gitlab-runner
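After the restart, confirm that the service picked up the new binary:
# Confirm the new version is active and the service is healthy
gitlab-runner --version
systemctl status gitlab-runner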
Update Configuration
To update the GitLab Runner configuration:
Method 1: Upload new configuration file
# From your local machine, upload the updated config
scp config.toml root@<MANAGER_IP>:/etc/gitlab-runner/config.toml
# Restart GitLab Runner to apply changes
ssh root@<MANAGER_IP> "systemctl restart gitlab-runner"
# Verify the service is running
ssh root@<MANAGER_IP> "systemctl status gitlab-runner"
Method 2: Edit configuration directly
# SSH into the manager
ssh root@<MANAGER_IP>
# Backup current configuration
cp /etc/gitlab-runner/config.toml /etc/gitlab-runner/config.toml.backup
# Edit configuration
nano /etc/gitlab-runner/config.toml
# Restart service
systemctl restart gitlab-runner
systemctl status gitlab-runner
Regular Maintenance Schedule
Weekly Tasks:
- Monitor dashboard for performance anomalies
- Check cost optimization metrics
- Review failed job patterns
- Verify autoscaling behavior
Monthly Tasks:
- Update GitLab Runner version
- Review and optimize configuration
- Analyze cost vs. performance metrics
- Update documentation if needed
Quarterly Tasks:
- Review instance types and pricing
- Evaluate performance optimizations
- Plan capacity for growth
- Security audit and updates
Update fleeting plugin
- Check the latest version at https://gitlab.com/hetznercloud/fleeting-plugin-hetzner/-/releases
- Set the specific version in config.toml:
  [runners.autoscaler]
    plugin = "hetznercloud/fleeting-plugin-hetzner:1.1.1"
- Run sudo gitlab-runner fleeting list, which will output something similar to:
  Runtime platform arch=amd64 os=linux pid=1729152 revision=4d7093e1 version=18.0.2
  runner: t2_yfVi36, plugin: hetznercloud/fleeting-plugin-hetzner:1.1.1, error: plugin not found: /root/.config/fleeting/plugins/registry.gitlab.com/hetznercloud/fleeting-plugin-hetzner
- Run sudo gitlab-runner fleeting install, which will output something similar to:
  Runtime platform arch=amd64 os=linux pid=1729160 revision=4d7093e1 version=18.0.2
  runner: t2_yfVi36, plugin: hetznercloud/fleeting-plugin-hetzner:1.1.1, path: /root/.config/fleeting/plugins/registry.gitlab.com/hetznercloud/fleeting-plugin-hetzner/1.1.1/plugin
- Done
Monitoring
Monitor our runners through:
- GitLab UI on the group's runners page (https://gitlab.com/groups/publicala/-/runners)
- Hetzner Cloud Console
- Grafana Dashboard
Key Performance Indicators
Performance Metrics:
- Average job execution time: < 10 minutes
- Cache hit rate: > 80%
- Instance ready time: < 2 minutes
- Runner saturation: < 90%
Cost Optimization:
- Idle instance time: < 30 minutes
- Resource utilization: > 70%
- Autoscaling efficiency: > 85%
Reliability Metrics:
- Job failure rate: < 5%
- Instance creation success: > 95%
- Service uptime: > 99.5%
Monitoring Setup
Our monitoring stack consists of Prometheus for metrics collection and Grafana for visualization.
- Prometheus Setup on Manager Instance
# Install Prometheus
sudo apt install prometheus
# Configure Prometheus
sudo nano /etc/prometheus/prometheus.yml
# Copy contents from prometheus.yml in this directory
# Restart Prometheus service
sudo systemctl restart prometheus
# /etc/prometheus/prometheus.yml
# Sample config for Prometheus.

global:
  scrape_interval: 15s     # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'example'

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    scrape_timeout: 5s
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:9090']

  - job_name: node
    # If prometheus-node-exporter is installed, grab stats about the local
    # machine by default.
    static_configs:
      - targets: ['localhost:9100']

  - job_name: 'gitlab-runner'
    static_configs:
      - targets: ['0.0.0.0:9252']
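To turn the KPI targets above into alerts, a rule file can be added under rule_files and routed through the Alertmanager configured above. A minimal sketch, using a hypothetical /etc/prometheus/rules/gitlab-runner.yml and alerting only when the runner metrics endpoint disappears; thresholds for the other KPIs can be added the same way:
# /etc/prometheus/rules/gitlab-runner.yml (hypothetical path, reference it via rule_files)
groups:
  - name: gitlab-runner
    rules:
      - alert: GitlabRunnerMetricsDown
        expr: up{job="gitlab-runner"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "GitLab Runner metrics endpoint on the manager is unreachable"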
- Grafana Configuration
- Our dashboard "Hetzner GitLab runners fleet 01" provides detailed metrics including:
- Runner version and status
- Job execution metrics (running, failed, duration)
- Runner saturation
- Error rates
- Autoscaling metrics
- Instance lifecycle timings
- GitLab API request statistics
The dashboard visualizes key metrics that help us:
- Monitor runner performance
- Track job execution times
- Identify bottlenecks
- Manage resource utilization
- Debug issues
- Plan capacity
The complete dashboard configuration is available in grafana-dashboard.json.
Metrics Collection
Our runners are configured to expose Prometheus metrics through the following settings in config.toml:
listen_address = "0.0.0.0:9252"
[runners.prometheus]
enabled = true
listen_address = ":9252"
These metrics are then:
- Collected by Prometheus running on the manager instance
- Stored in Prometheus's time-series database
- Visualized in our Grafana dashboard
- Used for monitoring and alerting
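A quick way to confirm the exporter is reachable from the manager instance (runner metrics are prefixed with gitlab_runner):
# Check that the metrics endpoint answers and exposes runner metrics
curl -s http://localhost:9252/metrics | grep '^gitlab_runner' | head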
Troubleshooting
Common Issues
- Runner Not Starting
  - Check the GitLab Runner service status
  - Verify the configuration file syntax
  - Check the runner logs
- Jobs Stuck in Pending
  - Verify the runner registration token
  - Check the autoscaling configuration
  - Monitor runner capacity
- Slow Job Execution
  - Check the cache configuration
  - Monitor system resources
  - Verify network connectivity
- Cache Issues
  - Verify S3 credentials
  - Check bucket permissions
  - Monitor cache hit rates
- Docker Connection Issues
  - Verify cloud-init completed successfully
  - Check Docker service status on runner instances
  - Review the enhanced cloud-init configuration in config.toml
  - Ensure GitLab Runner and fleeting plugin versions are compatible
- Resource Unavailability Issues
  - Symptoms: Repeated "resource_unavailable" errors in logs for all server types
  - Root Cause: Hetzner datacenter capacity limitations
  - Solutions:
    - Change the datacenter location in config.toml: location = "nbg1", location = "hel1", location = "fsn1", or location = "ash"
    - Try smaller server types first: server_type = ["cpx31", "cpx21", "cpx11", "cx32", "cx42", "cx52", "cpx41", "ccx53", "ccx43", "ccx33", "ccx23", "cpx51"]
    - Monitor the Hetzner status page (https://status.hetzner.com/) for capacity issues
    - Consider using mixed locations for better availability
    - During severe capacity shortages, even creating instances via the UI might work when the API fails
    - Be patient: the autoscaler will keep retrying and may eventually succeed
- Cloud-Init Ready Check Failures
  - Symptoms: "ready up preparation failed" with exit code 1, instances continuously created and destroyed
  - Root Cause: Complex cloud-init scripts with Docker image pre-warming or daemon configuration
  - Solutions:
    - Keep cloud-init simple and focused on essential packages only
    - Avoid pre-pulling Docker images in cloud-init (causes timeouts)
    - Avoid complex Docker daemon configuration in user_data
    - Test cloud-init changes in isolation before applying them to production
    - Working cloud-init example: basic Docker installation with standard packages only (see the sketch after this list)
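For reference, a minimal user_data along those lines could look like the sketch below. This is an assumption kept deliberately simple, not the production cloud-init; the working version lives in config.toml:
user_data = """
#cloud-config
package_update: true
packages:
  - docker.io
runcmd:
  - systemctl enable --now docker
"""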
Cost Optimization
Current Costs vs. Benefits
Monthly Costs (~60 USD):
- Manager instance (cpx11): ~3 EUR/month
- Runner instances (ccx53): Variable based on usage
- Object storage: ~5 EUR/month
- Network transfer: Minimal
Cost Savings:
- Eliminated GitLab managed runner costs (60+ USD/month)
- Better performance reduces overall pipeline time
- Shared cache reduces redundant downloads
Performance Benefits:
- 2-4x faster pipeline execution
- Dedicated CPU cores (no noisy neighbors)
- Optimized Docker layer caching
- Predictable performance characteristics
Monitor costs through:
- Hetzner billing dashboard
- Object store usage metrics
- Runner utilization stats
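To see what the fleet is actually running (and therefore accruing cost) at any moment, the Hetzner hcloud CLI can be used from any machine with a read-only API token configured; this assumes the hcloud CLI is installed:
# List all servers in the project, including autoscaled runner instances
hcloud server list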
Security Considerations
Infrastructure Security
Network Security:
- Runner instances in public network with minimal attack surface
- Manager instance with restricted firewall rules
- Secure token management for API access
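As an illustration of the restricted firewall rules mentioned above, a minimal host firewall on the manager could look like the following ufw sketch. This is an assumption, not the production rule set, which may rely on Hetzner Cloud Firewalls instead:
# Minimal host firewall sketch for the manager instance (assumption)
ufw default deny incoming
ufw default allow outgoing
ufw allow OpenSSH        # keep SSH reachable before enabling the firewall
ufw enable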
Access Control:
- Limited SSH key access stored in 1Password
- API tokens with minimal required permissions
- Regular security updates and patches
Data Protection:
- Encrypted storage for sensitive cache data
- Secure transmission of artifacts and logs
- Isolated execution environments per job
Emergency Procedures
Service Outage:
- Check GitLab Runner status
- Verify Hetzner cloud status
- Switch to GitLab managed runners temporarily
- Investigate and resolve root cause
Security Incident:
- Immediately revoke compromised tokens
- Scale down all instances
- Audit access logs
- Implement additional security measures
Maintenance Responsibility
As we manage our own runner infrastructure, our team is responsible for its entire lifecycle. This includes:
- System Updates
  - Regular OS updates
  - GitLab Runner version updates
  - Docker and dependencies maintenance
- Performance Monitoring
  - Resource utilization tracking
  - Pipeline execution times
  - Cache hit rates
  - Network performance
- Security Management
  - Access control
  - Network security
  - Vulnerability patching
  - Certificate management
- Cost Control
  - Resource optimization
  - Instance scaling
  - Storage utilization
  - Network transfer costs
- Incident Response
  - System outages
  - Performance degradation
  - Security incidents
  - Pipeline failures
Maintenance Log
Thursday 2025.07.24
Responsible: Franco Gilio and Claude Code
Reason:
- GitLab CI jobs showing excessive cleanup log messages ("Removing..." entries)
- Log pollution making it difficult to see actual job output
Actions:
- Updated FF_ENABLE_JOB_CLEANUP: Changed from true to false to disable verbose cleanup logging
- Updated documentation: Changed all references in runners_setup.md
- Restarted GitLab runner service: Applied configuration change to production
Results:
- Eliminated verbose cleanup messages from CI job logs
- Improved log readability while maintaining cleanup functionality
- Note: Cleanup still occurs, only verbose logging is disabled
Server(s): hetzner-gitlab-runners-fleet-01-manager (188.245.254.129)
Monday 2025.07.21
Responsible: Franco Gilio and Claude Code
Reason:
- Restore high-performance server types after Hetzner capacity issues resolved
Actions:
- Updated server types: Changed from temporary low-power instances back to dedicated CPU instances: ["ccx53", "ccx43", "ccx33", "ccx23", "cpx51", "cpx31", "cpx21"]
- Restarted GitLab runner service
Results:
- Restored optimal CI/CD performance with dedicated CPU instances
- Kept emergency fallbacks for future capacity issues
Server(s): hetzner-gitlab-runners-fleet-01-manager (188.245.254.129)
Thursday 2025.07.10
Responsible: Franco Gilio and Claude Code
Reason:
- GitLab runners not creating instances due to Hetzner capacity crisis
- CI jobs stuck at "Preparing the docker-autoscaler executor"
Actions:
- Diagnosed widespread capacity issue: Hetzner experiencing severe capacity shortage across all datacenters (nbg1, fsn1, hel1, ash)
- Updated server type priorities: Added smaller server types that have better availability: ["cpx31", "cpx21", "cpx11", "cx32", "cx42", "cx52", "cpx41"]
- Attempted multiple datacenters: Tested nbg1, fsn1, hel1, and ash locations
- Confirmed via the Hetzner status page: limited cloud plan availability affecting CX and CAX plans
Results:
- Eventually succeeded in creating a CPX21 instance after multiple retries
- Jobs resumed processing after ~10 minutes of unavailability
- Confirmed that smaller server types have better availability during capacity crises
- Updated troubleshooting documentation with new insights
Server(s): hetzner-gitlab-runners-fleet-01-manager (188.245.254.129)
Sunday 2025.07.06
Responsible: Franco Gilio and Claude Code
Reason:
- Performance optimization for GitLab runners
- Troubleshooting resource unavailability issues
Actions:
- Added alternative server types: Extended fallbacks from ["ccx53", "ccx43"] to ["ccx53", "ccx43", "ccx33", "ccx23", "cpx51"] for better availability
- Switched datacenter location: Changed from fsn1 to nbg1 to avoid capacity constraints
- Added Docker pull policy optimization: Set pull_policy = "if-not-present" to use locally cached images
- Implemented modern GitLab feature flags: Added 5 performance flags: FF_NETWORK_PER_BUILD=true, FF_USE_LEGACY_KUBERNETES_EXECUTION_STRATEGY=false, FF_RESOLVE_FULL_TLS_CHAIN=false, FF_DISABLE_UMASK_FOR_DOCKER_EXECUTOR=true, FF_ENABLE_JOB_CLEANUP=false
- Enhanced runner capacity: Increased concurrent jobs to 16 and capacity per instance to 8
- Optimized autoscaling response: Reduced update intervals to 30s (from 1m) and 2s when expecting (from 5s)
- Fixed cloud-init issues: Removed Docker image pre-warming and complex daemon configuration that caused instance ready check failures
- Added S3 cache optimizations: Added Insecure = false and BucketLocation = "eu-central-1"
Results:
- Enhanced job capacity and parallel processing capability
- Improved scaling response times
- Better server availability through multiple fallbacks
- Resolved instance creation failures
- 30-50% expected performance improvement for pipeline execution
Server(s): hetzner-gitlab-runners-fleet-01-manager (188.245.254.129)
Wednesday 2025.05.28
Responsible: Franco Gilio
Reason:
- Enhance Docker setup reliability
- Introduce fallback server types for improved availability
Actions:
- Added fallback server types: Introduced ccx43 as a fallback to primary ccx53 instances
- Enhanced cloud-init configuration: Improved Docker setup reliability on runner instances
- Updated documentation: Enhanced runners setup guide with fallback server information
- Optimized business hours policy: Adjusted autoscaling periods to 9AM-11PM UTC weekdays
Results:
- Improved runner availability through server type fallbacks
- More reliable Docker setup on new instances
- Better documentation coverage
Server(s): hetzner-gitlab-runners-fleet-01-manager
Sunday 2025.03.09
Responsible: Franco Gilio
Reason:
- Migrate to a new Hetzner region because of an incident in the original region
Actions:
- Region migration: Updated the configuration from the previous region to fsn1 (Falkenstein)
- Updated documentation: Reflected the new region in the setup guide
Results:
- Restored availability in the new region
Server(s): All GitLab runner instances
Friday 2025.01.24 - Initial Setup
Responsible: Franco Gilio
Reason:
- Initial setup of self-hosted GitLab runners fleet
- Cost optimization and performance improvement over GitLab managed runners
Actions:
- Created GitLab runner configuration: Comprehensive config.toml with autoscaling and Docker executor
- Setup documentation: Created a detailed setup guide covering architecture, prerequisites, and maintenance
- Monitoring integration: Added Prometheus metrics collection and Grafana dashboard
- Cost analysis: Documented cost benefits vs GitLab managed runners
Results:
- Established self-hosted GitLab runners infrastructure
- Achieved cost savings compared to managed runners
- Improved performance with dedicated resources
- Comprehensive monitoring and documentation
Server(s): hetzner-gitlab-runners-fleet-01-manager (initial deployment)